SQL based frequent pattern mining
Author
Abstract
Data mining on large relational databases has gained popularity, and its significance is well recognized. However, SQL-based data mining is known to perform worse than specialized implementations because of the prohibitive cost of extracting knowledge and the lack of suitable declarative query language support. Frequent pattern mining is the foundation of several essential data mining tasks. These facts motivated us to develop original SQL-based approaches for mining frequent patterns. In this work, we investigate SQL-based approaches to the problem of finding frequent patterns in a transaction table. Most existing approaches are Apriori-like. However, those methods may perform poorly because of the costly candidate-generation-and-test operation, especially when mining datasets with prolific and/or long patterns. We develop a class of efficient SQL-based pattern growth methods for mining frequent patterns. What these approaches have in common is that they use a divide-and-conquer method to decompose the mining task and then use pattern growth to avoid the combinatorial problem inherent in the candidate-generation-and-test approach. SQL-based Apriori algorithms require either several scans over the data or many complex joins between the input tables, whereas our SQL-based algorithms avoid both multiple passes over the large original input table and complex joins between tables. A comprehensive performance study on a DBMS (IBM DB2 UDB EEE V8) compares SQL-based frequent pattern mining approaches based on Apriori with the approaches in this thesis. The empirical results show that our algorithms perform efficiently. Moreover, recently...
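To make the divide-and-conquer idea concrete, the following is a minimal sketch of pattern growth driven by SQL. It is not the thesis's algorithm: it assumes a transaction table with columns (tid, item), uses SQLite through Python's sqlite3 module rather than DB2, and the names `mine`, `transactions`, and `proj_k` are hypothetical. For each frequent item it builds a projected table that holds only the transactions containing that item, restricted to items that sort after it, and then recurses on that smaller table, so no candidate itemsets are generated and the original table is not self-joined.

```python
import sqlite3


def mine(conn, table, min_support, prefix=(), results=None):
    """Recursively collect frequent itemsets from `table` (columns: tid, item)."""
    if results is None:
        results = {}
    # Support of an item = number of distinct transactions that contain it.
    frequent = conn.execute(
        f"SELECT item, COUNT(DISTINCT tid) FROM {table} "
        "GROUP BY item HAVING COUNT(DISTINCT tid) >= ? ORDER BY item",
        (min_support,),
    ).fetchall()
    for item, support in frequent:
        pattern = prefix + (item,)
        results[pattern] = support
        # Projected table (hypothetical name): transactions that contain `item`,
        # keeping only items ordered after it so each itemset is found once.
        proj = f"proj_{len(pattern)}"
        conn.execute(f"CREATE TEMP TABLE {proj} (tid INTEGER, item TEXT)")
        conn.execute(
            f"INSERT INTO {proj} SELECT tid, item FROM {table} "
            f"WHERE item > ? AND tid IN (SELECT tid FROM {table} WHERE item = ?)",
            (item, item),
        )
        # Grow the pattern inside the smaller projected table, then clean up.
        mine(conn, proj, min_support, pattern, results)
        conn.execute(f"DROP TABLE {proj}")
    return results


# Toy data (invented for illustration): five transactions over items a, b, c.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (tid INTEGER, item TEXT)")
baskets = {1: "abc", 2: "ab", 3: "ac", 4: "bc", 5: "abc"}
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(tid, item) for tid, items in baskets.items() for item in items],
)

patterns = mine(conn, "transactions", min_support=3)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

With a minimum support of 3, the itemset {a, b, c} occurs in only two of the toy transactions and is therefore never reported. Each recursive call reads only its own projected table, which is the property the abstract contrasts with the repeated scans and joins of Apriori-style SQL.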
Similar resources
Comparative Analysis of Various Approaches Used in Frequent Pattern Mining
Frequent pattern mining has become an important data mining task and a central theme in data mining research. Frequent patterns are patterns that appear frequently in a data set. Frequent pattern mining searches for recurring relationships in a given data set. Various techniques have been proposed to improve the performance of frequent pattern mining algorithms. This paper presents revi...
SQL Based Frequent Pattern Mining with FP-Growth
Scalable data mining in large databases is one of today’s real challenges for the database research area. The integration of data mining with database systems is an essential component of any successful large-scale data mining application. A fundamental component of data mining tasks is finding frequent patterns in a given dataset. Most of the previous studies adopt an Apriori-like candidate set gen...
Shaping SQL-Based Frequent Pattern Mining Algorithms
Integration of data mining and database management systems could significantly ease the process of knowledge discovery in large databases. We consider implementations of frequent itemset mining algorithms, in particular pattern-growth algorithms similar to the top-down FP-growth variations, tightly coupled to relational database management systems. Our implementations remain within the confines...
Efficient Frequent Pattern Mining in Relational Databases
Data mining on large relational databases has gained popularity, and its significance is well recognized. However, SQL-based data mining is known to perform worse than specialized implementations because of the prohibitive cost of extracting knowledge and the lack of suitable declarative query language support. We investigate approaches based on SQL for t...
A Survey Paper on Frequent Pattern Mining for Uncertain Database
A number of existing algorithms mine frequent patterns from certain, or precise, data. Nowadays, however, the demand for mining uncertain data has increased, since data are uncertain in many situations. Two main approaches have been proposed for frequent pattern mining from uncertain data: the level-wise approach and the pattern-growth approach. The level-wise approach uses the ge...